1. INTRODUCTION

In computer vision, the automatic visual detection and tracking of objects is one of the most challenging problems for homes, businesses, and industries. Visual detection and tracking have numerous applications, including reconnaissance systems, vehicle tracking, and aerospace applications, to name a few. Detecting and tracking arbitrary objects, vehicles in particular, with conventional motion estimation and pattern recognition methods remains an open problem. On the one hand, separating moving objects from the background image across continuous video frames is treated as detection of the moving targets; on the other hand, finding the successive locations of the moving objects in a video is treated as tracking of the moving targets. Detecting and tracking these moving objects therefore requires a dedicated processing procedure.
Subtraction of two consecutive frames, subtraction of the background from frames, and optical flow are the main well-known methods for moving object detection [1]. In the optical flow procedure [2], the flow field image is calculated and the features are grouped by cluster processing, which gives good detection. However, this procedure is not suitable for real-time processing because of its large amount of calculation and its sensitivity to noise [3]. In the background subtraction procedure [3], by contrast, the moving object is detected by subtracting the background from the current frame. It is a simple procedure and, when the background is already known, it can provide complete information about the objects. For feature extraction, many methods are available, such as Speeded-Up Robust Features (SURF), which is a faster variant of the Scale-Invariant Feature Transform (SIFT). Many authors [1], [4]-[6] use prior knowledge to improve video processing by defining spatial, temporal, and co-occurrence relationships between objects. Other authors [3], [4], [7] highlight the problem of finding multiple objects across a continuous sequence of related frames. The authors of [4] propose a method to localize and track an object in the image, and show that it works better with SURF feature extraction and an improved Camshift algorithm that automatically adjusts for illumination in order to recover a lost target and track the features of the object. Moreover, the authors of [8] present a tracking strategy using the same Camshift procedure for a PTZ camera. Nevertheless, this work did not provide satisfactory performance for images without strong texture and features, as shown in Figure 1.
Figure 1. General Representation of the System


However, [5] reports successful detection for objects of the same type under several conditions, and shows that multiple object detection works well when the object is scaled up or down or rotated. It does not, however, give clear information for two or more distinct images, where different types of objects must be identified, and its video segmentation step is also unclear. Similarly, using an improved KLT tracker over SURF features, [6] simulates tracking of the detected objects by extracting features with SURF and tracking them with KLT for multiple objects. In [2], a Kalman filter is used to estimate the state of a target object, and optical flow is used to estimate the object's velocity. Using a deformable part-based method, [9] presents an approach to detect and track multiple objects in CCTV footage based on spatial constraints among the objects, taking advantage of HOG feature extraction. In contrast, [10] presents adaptive fusion of multiple features, and its comparison shows that the fusion-based approach is more robust than the product rule and the weighted sum rule. In [1], the authors use a static camera for video and take the first frame as the background frame; the background is then subtracted from the current frame. However, these papers do not present clear examples to aid the reader's comprehension. Table 1 summarizes the literature reviewed in this research.
Table 1. Summary of the Related Literature

| Research Title | Approach | Strengths |
|---|---|---|
| Multiple Object Tracking using Kalman Filter and Optical Flow [2] | [illegible] | A Kalman filter is used to estimate the state of an object's location. Optical flow can identify the speeds of an object. |
| [title partly illegible] Video Surveillance [1] | Simulation | Uses a static camera for video and takes the first frame as the background frame. The background is then subtracted from the current frame. |
| Object Tracking using Improved Camshift with SURF Method [4] | Analytical and Simulation | SURF feature extraction and an improved Camshift algorithm that automatically adjusts for illumination to recover a lost target and track the features of the object. |
| Traffic Sign Recognition Using SURF [...] Artificial Neural Network Classifier [11] | Simulation | High recognition accuracy; color-based segmentation along with morphological and geometrical properties. |
| Multiple Object Detection for Smart TV Shopping Video using Point-to-Point Feature Based SURF Method [5] | Simulation | Successfully simulated for objects of the same type under several conditions, i.e. multiple object detection works well when the object is scaled up or down or rotated. |
| Multiple Object Tracking by Improved KLT Tracker Over SURF Features [6] | Analytical and Simulation | Front-facing feature extraction and recognition for multiple objects. |


2. PROPOSED METHOD


The main objective of this system is to detect multiple moving objects robustly and accurately by developing an algorithm, and to track the features of the detected objects across the video frames.


[Flowchart: Input Video -> Frame Selection, Separation and Conversion -> Background Reference Frame and In-Progress Frame -> Subtract Background Reference Frame -> Perform Morphological Filter Operation to Clean Foreground -> Detect Moving Objects in the Frames -> Detect the Features of All Objects using SURF -> Track All Object Features]

Figure 2. Overview of the Processing Steps
i. Take the input video from a static camera and convert its frames to images; the first 25 frames are treated as the background.
ii. After the last background training frame, treat the next frame as the in-progress frame and apply background subtraction by subtracting the background reference frame from it.
iii. The result may contain noise. To reduce the noise and obtain clean foreground objects, apply morphological filters; the moving objects are then detected.
iv. Extract the features of each detected object individually using SURF.
v. Track the detected features through the video with the k-NN algorithm; at each tracking step, the current object features are stored as the old features.
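
Steps i to iii can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: frames are assumed to be 2D lists of grayscale values, the background model is assumed to be the pixel-wise mean of the training frames, and the threshold value and the 3x3 structuring element are hypothetical choices.

```python
# Illustrative sketch of steps i-iii (assumptions: mean background model,
# fixed threshold, 3x3 structuring element; none are specified by the text).

def mean_background(frames):
    """Average the training frames pixel by pixel to get a background model."""
    rows, cols = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def _morph(mask, rule):
    """Apply a 3x3 neighborhood rule; pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                      for rr in range(r - 1, r + 2)
                      for cc in range(c - 1, c + 2)]
            out[r][c] = 1 if rule(window) else 0
    return out


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    eroded = _morph(mask, all)   # keep a pixel only if its whole window is foreground
    return _morph(eroded, any)   # grow the surviving pixels back to their original extent
```

With 25 training frames passed to `mean_background`, the mask of the next frame is `opening(foreground_mask(frame, background))`; isolated noise pixels are removed while blobs at least as large as the structuring element survive.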


3. RESEARCH METHOD 

An appearance model, a location model, and a search strategy are the three major components of any tracking system. For the proposed multiple object detection and tracking, we build the appearance model by detecting objects with the background subtraction method, followed by feature extraction with SURF and continuous tracking with k-NN.


3.1. Speeded-Up Robust Features (SURF)

The Scale-Invariant Feature Transform (SIFT) is an effective approach to feature detection presented by [12]. The SURF algorithm is based on similar principles and steps; however, it uses a different scheme and is intended to give better results, faster. In order to detect feature points in a scale-invariant manner, SIFT uses a cascaded filtering approach in which the Difference of Gaussians (DoG) is computed on progressively downscaled images [13].


3.2. Keypoint Detection Using SURF

Generally, scale invariance is achieved by examining the image at different scales, the scale space, using Gaussian kernels. Both SIFT and SURF divide the scale space into levels and octaves. An octave corresponds to a doubling of σ, and each octave is divided into uniformly spaced levels, as shown in Figure 3.
Figure 3: Octaves with 3 levels; the 3x3x3 neighborhood used for the
non-maximum suppression that detects features is highlighted,
from [13].


Figure 4: Integral images for Area computation 
from [7] 


Both approaches build a pyramid of response maps, with several levels inside each octave. A response map is the result of an operation on the image. The interest points are the points that are extrema among their 8 neighbors in the current level and their 2x9 neighbors in the levels above and below. This is a non-maximum suppression in a 3x3x3 neighborhood; the relationship between levels, octaves, and neighborhood is illustrated in Figure 3 [13].

SURF is characterized by its use of integral images. With an integral image, computing the sum over an upright rectangular region is reduced to four operations, and computing a first-order Haar wavelet response takes six operations. The integral image of the image I(x, y) (0 ≤ x < M, 0 ≤ y < N) is defined by the formula:

$$I_{\Sigma}(x, y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j) \qquad (1)$$

[14] shows how to obtain sums of pixel intensities quickly. Taking A, B, C, and D as the top-left, top-right, bottom-left, and bottom-right corner points of the rectangle on the integral image, the sum is:

$$\Sigma_{ABCD} = I_{\Sigma}(A) + I_{\Sigma}(D) - I_{\Sigma}(B) - I_{\Sigma}(C) \qquad (2)$$
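
The integral image and the four-lookup rectangle sum can be sketched as follows. This is a minimal illustration, assuming the image is a 2D list of numbers; the function names are hypothetical, and the integral image is padded with one extra row and column of zeros so that no boundary checks are needed.

```python
def integral_image(img):
    """Return ii with a leading zero row and column:
    ii[r][c] is the sum of img over rows 0..r-1 and columns 0..c-1."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current image row
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] (inclusive) using four lookups."""
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])
```

Once `integral_image` has been computed, every call to `box_sum` costs the same regardless of the rectangle size, which is why box filters of any scale can be evaluated without subsampling the image.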


The Gaussian pyramid, i.e., the image scale space, is mainly used to find interest points at different scales. Here, the Gaussian kernels can be changed in size to build the Gaussian pyramid. As the following figure shows, the Laplacian of Gaussian is approximated by a box filter.


Figure 5. Interest Points of the Laplacian of Gaussian, from [13]


Using this approach, multiple layers of the scale-space pyramid can be processed simultaneously, and the need to subsample the image is removed, which gives better performance. To determine whether a point is an extremum, the determinant of the Hessian is used for interest point localization. Suppose f(x, y) is a continuous function of two variables; then the Hessian matrix is:











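$$H(f(x, y)) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \qquad (3)$$

and the determinant:

$$\det H = \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x \partial y} \right)^2 \qquad (4)$$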


If det H < 0, the eigenvalues of H have different signs and the point is not a local extremum. If det H > 0, the eigenvalues have the same sign and the point is a local extremum. Following [10], replacing f(x, y) with I(x, y), the Hessian matrix of the image is:


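$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix} \qquad (5)$$

where $L_{xx}(\mathbf{x}, \sigma) = I(\mathbf{x}) * \frac{\partial^2}{\partial x^2} g(\sigma)$ and $L_{xy}(\mathbf{x}, \sigma) = I(\mathbf{x}) * \frac{\partial^2}{\partial x \partial y} g(\sigma)$, and similarly for $L_{yy}$.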
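
The sign test on the determinant of the Hessian can be illustrated numerically. The sketch below is not the SURF computation itself (which approximates the second derivatives with box filters); it assumes a smooth function `f` and estimates the second derivatives with central finite differences, with a hypothetical step size `h`.

```python
def hessian_det(f, x, y, h=1e-3):
    """Estimate det H of f at (x, y) with central finite differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return fxx * fyy - fxy ** 2


def peak(x, y):
    """A function with a local maximum at the origin."""
    return -(x ** 2 + y ** 2)


def saddle(x, y):
    """A function with a saddle point at the origin."""
    return x ** 2 - y ** 2
```

For `peak` the determinant at the origin is positive (both eigenvalues are negative), so the origin is an extremum; for `saddle` it is negative, so the origin is rejected.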


3.3. Interest Point Detection Using SURF

The SURF interest point descriptor computes the Haar wavelet responses in both the x and y directions in a circular region centered at the interest point with a radius of 6σ. It depends on the dominant orientation of each interest point. The size of the Haar wavelet is 4σ, and the sum of the response vectors is computed within each 60-degree sector of the circle. Finally, the orientation with the largest vector sum is the dominant orientation. The procedure is shown in the following figure.
Figure 6. Orientation Assignment, from [7]
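
The orientation assignment can be sketched as a sliding 60-degree window over the Haar responses. This is a simplified illustration: the responses are assumed to be given as `(dx, dy)` pairs, the number of window positions (`steps`) is a hypothetical parameter, and the Gaussian weighting that SURF applies to the responses is omitted.

```python
import math


def dominant_orientation(responses, window=math.pi / 3, steps=72):
    """Return the angle (radians) of the largest summed response vector
    found in any sliding window of the given angular width."""
    two_pi = 2 * math.pi
    best_length, best_angle = -1.0, 0.0
    for k in range(steps):
        start = two_pi * k / steps
        sum_x = sum_y = 0.0
        for dx, dy in responses:
            angle = math.atan2(dy, dx) % two_pi
            # Keep the response if its angle lies in [start, start + window).
            if (angle - start) % two_pi < window:
                sum_x += dx
                sum_y += dy
        length = math.hypot(sum_x, sum_y)
        if length > best_length:
            best_length, best_angle = length, math.atan2(sum_y, sum_x)
    return best_angle
```

The window with the longest summed vector wins, so a few strong responses pointing in roughly the same direction dominate scattered responses elsewhere in the circle.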


After the dominant orientation is determined, [13] describes how a square window with a side length of 20σ is constructed, centered at each interest point. The window is then divided into 4x4 sub-regions, and the wavelet response is computed in both the dominant orientation and the orientation perpendicular to it. If we denote the wavelet responses in x and y as dx and dy, then each sub-region yields 4 values, Σdx, Σdy, Σ|dx|, Σ|dy|, which gives a 64-element vector in total for each interest point. The descriptor is then obtained by normalizing this vector.
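
The assembly of the 64-element descriptor can be sketched as follows. The sketch assumes the Haar responses have already been computed on a 20x20 grid of sample points aligned with the dominant orientation, i.e. 5x5 samples per sub-region; that sampling density and the function name are assumptions for illustration, and Gaussian weighting is omitted.

```python
import math


def surf_like_descriptor(dx, dy):
    """Build a 64-element descriptor from 20x20 grids of Haar responses.

    Each of the 4x4 sub-regions contributes
    (sum dx, sum dy, sum |dx|, sum |dy|) over its 5x5 samples.
    """
    descriptor = []
    for block_row in range(4):
        for block_col in range(4):
            sum_dx = sum_dy = abs_dx = abs_dy = 0.0
            for r in range(block_row * 5, block_row * 5 + 5):
                for c in range(block_col * 5, block_col * 5 + 5):
                    sum_dx += dx[r][c]
                    sum_dy += dy[r][c]
                    abs_dx += abs(dx[r][c])
                    abs_dy += abs(dy[r][c])
            descriptor.extend([sum_dx, sum_dy, abs_dx, abs_dy])
    # Normalize to unit length so the descriptor is invariant to contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

The count works out as 4 x 4 sub-regions times 4 values each, which gives the 64 elements mentioned above.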


3.4. k-Nearest Neighbor (k-NN) Classifier 

K-Nearest Neighbor (KNN from here on) is one of those algorithms that are very simple to understand yet work remarkably well in practice. It is also surprisingly versatile, and its applications range from vision to proteins to computational geometry to graphs and so on. KNN is a non-parametric lazy learning algorithm. The authors in [15] explain that when a method is non-parametric, it does not make any assumptions about the underlying data distribution. This is quite useful because, in the real world, most practical data does not obey the typical theoretical assumptions (e.g. Gaussian mixtures, linear separability, and so on).


3.5. Assumptions in the KNN Classifier 

KNN assumes that the data is in a feature space. More precisely, the data points are in a metric space. The data can be scalars or possibly even multidimensional vectors [16]. Since the points are in a feature space, they have a notion of distance. This need not necessarily be the Euclidean distance, although it is the one most commonly used.
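
As an illustration of how a distance in feature space can be used for tracking, the sketch below matches each new feature vector to its nearest old feature vector. It is not the exact procedure of this work: the Euclidean distance, the acceptance threshold `max_dist`, and the function names are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, points, k=1, dist=euclidean):
    """Return the indices of the k points closest to query, nearest first."""
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return order[:k]


def match_features(old, new, max_dist=0.5):
    """Map each new feature index to its nearest old feature index,
    keeping only matches whose distance does not exceed max_dist."""
    matches = {}
    if not old:
        return matches
    for j, query in enumerate(new):
        i = k_nearest(query, old, k=1)[0]
        if euclidean(query, old[i]) <= max_dist:
            matches[j] = i
    return matches
```

After matching, the new features would replace the old ones before the next frame is processed; any other metric can be passed to `k_nearest` through `dist`.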


3.6. KNN for Density Estimation 
Although classification remains the primary application of KNN, we can also use it for density estimation. Since KNN is non-parametric, it can estimate arbitrary distributions. The idea is very similar to the use of the Parzen window. Instead of using a fixed hypercube and kernel functions, to estimate the density at a point x, place a hypercube centered at x and keep increasing its size until k neighbors are captured. Then estimate the density using the formula:

$$p(x) = \frac{k / n}{V} \qquad (6)$$

where n is the total number of data points and V is the volume of the hypercube. Note that the numerator is essentially constant and the density is governed by the volume [13].
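
The density estimate can be sketched as follows, assuming the data points are lists of coordinates of equal dimension. The hypercube is grown by taking its half-side to be the Chebyshev (maximum-coordinate) distance to the k-th nearest point; the function name is hypothetical.

```python
def knn_density(x, data, k):
    """Estimate the density at x as (k / n) / V, where V is the volume of
    the smallest hypercube centred at x that contains k data points."""
    dimension = len(x)
    # Half-side of the smallest cube centred at x that reaches each point.
    half_sides = sorted(max(abs(a - b) for a, b in zip(x, point))
                        for point in data)
    half_side = half_sides[k - 1]
    if half_side == 0:
        raise ValueError("k points coincide with x; the volume would be zero")
    volume = (2 * half_side) ** dimension
    return (k / len(data)) / volume
```

For the one-dimensional data 0, 1, 2, 3, 4 with x = 2 and k = 3, the cube must reach from 1 to 3, so V = 2 and the estimate is (3 / 5) / 2 = 0.3.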


4. RESULTS AND ANALYSIS 

The following outputs show the simulation results for moving object detection, for which we used a still camera to record the video frames. Figure 7 shows the background reference frame. The background image is subtracted from the in-progress frame to detect multiple foreground objects (Figure 8), which shows the difference between the original in-progress frame and the reference background frame. The next image (Figure 9) shows the morphologically filtered version of that foreground frame, which gives a clean foreground and is followed by the detection of objects. The detected objects are segmented and their strongest feature points are extracted using SURF. The detected features are then tracked continuously through the video by the k-NN algorithm; at each step, the new object features are stored as the old features (Figures 10-15).
Figure 8. Foreground Frame after Subtraction from Figure 7 (May Contain Noise)
Figure 9. Side-by-Side Unclear and Clear Image (after Applying Morphological Filter)
Figure 10. A Moving Object with the Strongest Feature Points of the Object (using SURF)
Figure 11. Strongest Feature Points from Frame 94 (using SURF)
Figure 12. Detected Object Feature Matching in Frame 94 (using SURF)
Figure 15. After Tracking until Frame Number 134


5. CONCLUSION 

Most multiple object detection methods work for multiple occurrences of the same type of object. However, multiple object detection and tracking across multiple video files requires detecting different types of objects in the same or different video files. The objective of this work is to detect multiple objects of the same or different types, in the same or different video files, one by one using feature-point-to-feature-point matching. A major advantage of the proposed approach is that it can detect objects despite a change in scale or an in-plane rotation.